Learning to Fool the Speaker Recognition
Authors
Abstract
Due to the widespread deployment of fingerprint, face, and speaker recognition systems, the risks in these systems, especially adversarial attacks, have drawn increasing attention in recent years. Previous research mainly studied attacks on vision-based systems such as fingerprint and face recognition. Attacks on speech-based systems have not been well studied yet, although such systems are widely used in daily life. In this article, we attempt to fool a state-of-the-art speaker recognition model and present an attacker, a lightweight multi-layer convolutional neural network that fools a well-trained speaker recognition model by adding imperceptible perturbations onto the raw speech waveform. We find that the speaker recognition system is vulnerable to such attacks, and we achieve a high success rate on both non-targeted and targeted attacks. Besides, we present an effective method that leverages a pretrained phoneme recognition model to optimize the attacker and obtain a tradeoff between attack success rate and perceptual quality. Experimental results on the TIMIT and LibriSpeech datasets demonstrate the effectiveness and efficiency of the proposed model. Experiments on frequency analysis indicate that high-frequency attacks are more effective than low-frequency attacks, which differs from the conclusion of previous image-based works. Additionally, an ablation study gives insights into ...
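The core operation named in the abstract, adding a small perturbation to the raw speech waveform, can be illustrated with a minimal sketch. This is not the paper's attacker: the perturbation here is random noise standing in for the output of a trained network, the function names `apply_perturbation` and `snr_db`, the bound `epsilon`, and the use of signal-to-noise ratio as a rough proxy for perceptual quality are all assumptions made for illustration.

```python
import math
import random


def apply_perturbation(waveform, perturbation, epsilon=0.002):
    """Add a perturbation bounded to [-epsilon, epsilon] to a waveform in [-1, 1].

    `epsilon` is a hypothetical imperceptibility budget, not a value from the paper.
    """
    if len(waveform) != len(perturbation):
        raise ValueError("waveform and perturbation must have equal length")
    adversarial = []
    for sample, delta in zip(waveform, perturbation):
        delta = max(-epsilon, min(epsilon, delta))  # bound the perturbation
        adversarial.append(max(-1.0, min(1.0, sample + delta)))  # stay in valid range
    return adversarial


def snr_db(clean, adversarial):
    """Signal-to-noise ratio in dB; higher means a less audible change."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((a - s) ** 2 for s, a in zip(clean, adversarial))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


# Toy input: one second of a 440 Hz tone at 16 kHz (assumed sample rate).
rng = random.Random(0)
clean = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
# Placeholder for an attacker network's output: plain random noise.
raw_delta = [rng.uniform(-0.01, 0.01) for _ in clean]
adv = apply_perturbation(clean, raw_delta, epsilon=0.002)
print(f"SNR: {snr_db(clean, adv):.1f} dB")
```

A smaller `epsilon` raises the SNR but leaves less room to change the recognizer's decision, which is the kind of tradeoff between attack success and perceptual quality that the abstract mentions.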
Similar resources
Using Genetic Algorithm to “Fool” HMAX Object Recognition Model
HMAX ("Hierarchical Model and X") system is among the best machine vision approaches developed today, in many object recognition tasks [1]. HMAX decomposes an image into features which are passed to a classifier. These features each capture information about a small section of the input image but might not have information about the overall structure of the image if there is not a significant n...
Collaborative Learning for Language and Speaker Recognition
This paper presents a unified model to perform language and speaker recognition simultaneously and altogether. The model is based on a multi-task recurrent neural network where the output of one task is fed as the input of the other, leading to a collaborative learning framework that can improve both language and speaker recognition by borrowing information from each other. Our experiments demo...
Native Speaker Norms and Teaching English to Non-Native Speakers: The Case of Iranian EFL Learners
Today, it is accepted as a fact that English is the fastest-spreading and most widely used language in the world. The shared use of English as an international language calls for norms and models for language learning and teaching. Linguists have paid special attention to the concept of the "mother tongue" as the only true and reliable source of language data. However, this term does not seem sufficiently clear ...
Scalable learning for geostatistics and speaker recognition
With improved data acquisition methods, the amount of data that is being collected has increased several fold. One of the objectives in data collection is to learn useful underlying patterns. In order to work with data at this scale, the methods not only need to be effective with the underlying data, but also have to be scalable to handle larger data collections. My research focused on developi...
Learning statistically efficient features for speaker recognition
We apply independent component analysis (ICA) for extracting an optimal basis to the problem of finding efficient features for a speaker. The basis functions learned by the algorithm are oriented and localized in both space and frequency, bearing a resemblance to Gabor functions. The speech segments are assumed to be generated by a linear combination of the basis functions, thus the distributio...
Journal
Journal title: ACM Transactions on Multimedia Computing, Communications, and Applications
Year: 2021
ISSN: 1551-6857, 1551-6865
DOI: https://doi.org/10.1145/3468673